Prosodic boundary information helps unsupervised word segmentation

Authors

  • Bogdan Ludusan
  • Gabriel Synnaeve
  • Emmanuel Dupoux
Abstract

It is well known that infants use prosodic information in early language acquisition. In particular, prosodic boundaries have been shown to help infants with sentence- and word-level segmentation. In this study, we extend an unsupervised method for word segmentation to include information about prosodic boundaries. The boundary information was either derived from oracle (hand-annotated) data or extracted automatically by a system that uses only acoustic cues for boundary detection. The approach was tested on two languages, English and Japanese, and the results show that boundary information helps word segmentation in both cases. The performance gain obtained for two typologically distinct languages shows the robustness of prosodic information for word segmentation. Furthermore, the improvements are not limited to oracle information: similar performance is obtained with automatically extracted boundaries.


Related resources

Word segmentation in Persian continuous speech using F0 contour

Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...


The use of phrase-level prosodic information in lexical segmentation: evidence from word-spotting experiments in Korean.

This study investigated the role of phrase-level prosodic boundary information in word segmentation in Korean with two word-spotting experiments. In experiment 1, it was found that intonational cues alone helped listeners with lexical segmentation. Listeners paid more attention to local intonational cues (...H#L...) across the prosodic boundary than the intonational information within a prosodi...


The Role of Prosody and Speech Register in Word Segmentation: A Computational Modelling Perspective

This study explores the role of speech register and prosody for the task of word segmentation. Since these two factors are thought to play an important role in early language acquisition, we aim to quantify their contribution for this task. We study a Japanese corpus containing both infant- and adult-directed speech and we apply four different word segmentation models, with and without knowledge ...


Word Boundary Information and Chinese Word Segmentation

Chinese word segmentation could be considered as a problem of word boundary recognition. Word boundary information plays a significant role in human language acquisition and automatic segmentation for Natural Language Processing (NLP). Extraction of word boundary information involves cognitive psychology, computational linguistics, and language education. Methods utilizing word boundary informa...


Prosodic Word Grouping in Mandarin TTS System

This paper reports the methodology and results of prosodic word grouping for a Mandarin TTS system developed by Fujitsu Laboratories. Because any break inside a prosodic word will make speech unintelligible or unnatural, a new prosodic word grouping framework is proposed. The word segmentation result can be regarded as an initial prosodic word sequence with grids inserted into each word bou...




Publication date: 2015